Clustering with Propagated Constraints
Author: Eric Robert Eaton
Abstract
Title of Thesis: Clustering with Propagated Constraints
Eric Robert Eaton, Master of Science, 2005
Thesis directed by: Dr. Marie desJardins, Assistant Professor, Department of Computer Science and Electrical Engineering
Background knowledge in the form of constraints can dramatically improve the quality of generated clustering models. In constrained clustering, these constraints typically specify the relative cluster membership of pairs of points. Constraints are tedious and costly for a user to specify, yet they are most useful in large quantities. Existing constrained clustering methods perform well when given large quantities of constraints, but are not designed to perform well when given very few. This thesis focuses on producing a high-quality clustering from a small number of constraints. It proposes a method that takes a few easily specified pairwise constraints and propagates them to nearby pairs of points using a Gaussian function, thereby constraining the local neighborhood. Clustering with these propagated constraints can yield superior performance with fewer constraints than clustering with only the original user-specified constraints. The experiments compare the performance of clustering with propagated constraints to that of established constrained clustering algorithms on several real-world data sets.
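The abstract does not give the propagation formula, so the following is only a rough sketch of the general idea, not the thesis's actual method. It assumes that a constraint on the pair (i, j) is passed to another pair (a, b) with a weight equal to the product of two Gaussian terms, one for the distance from a to i and one for the distance from b to j. The function names, the +1/-1 encoding of must-link and cannot-link, and the `sigma` and `threshold` parameters are all hypothetical.

```python
import math


def gaussian(p, q, sigma):
    """Gaussian closeness of two points: 1.0 when identical, near 0 when far apart."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def propagate_constraints(points, constraints, sigma=1.0, threshold=1e-3):
    """Spread each pairwise constraint to nearby pairs of points.

    points      : list of coordinate tuples.
    constraints : list of (i, j, sign); sign is +1 for must-link and
                  -1 for cannot-link (an assumed encoding).
    Returns a symmetric n x n list of signed constraint weights.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i, j, sign in constraints:
        # Closeness of every point to each endpoint of the constraint.
        g_i = [gaussian(p, points[i], sigma) for p in points]
        g_j = [gaussian(p, points[j], sigma) for p in points]
        for a in range(n):
            for b in range(a + 1, n):
                # The pair is unordered, so take the better of the two matchings.
                w = max(g_i[a] * g_j[b], g_i[b] * g_j[a])
                if w >= threshold:  # drop negligible propagated weights
                    weights[a][b] += sign * w
                    weights[b][a] += sign * w
    return weights


# Two tight groups of points; one cannot-link between points 0 and 2.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
weights = propagate_constraints(points, [(0, 2, -1)], sigma=0.5)
```

In this toy example the original pair (0, 2) keeps full weight, the neighboring pair (1, 3) receives a slightly weaker cannot-link weight, and pairs inside the same group such as (0, 1) receive none. A constrained clustering algorithm could then consume the weighted matrix in place of the single user-specified constraint.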
Similar resources
Scalable Active Temporal Constrained Clustering
We introduce a novel interactive framework to handle both instance-level and temporal smoothness constraints for clustering large temporal data. It consists of a constrained clustering algorithm which optimizes the clustering quality, constraint violation and the historical cost between consecutive data snapshots. At the center of our framework is a simple yet effective active learning techniqu...
Value, Cost, and Sharing: Open Issues in Constrained Clustering
Clustering is an important tool for data mining, since it can identify major patterns or trends without any supervision (labeled data). Over the past five years, semi-supervised (constrained) clustering methods have become very popular. These methods began with incorporating pairwise constraints and have developed into more general methods that can learn appropriate distance metrics. However, s...
Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is an NP-hard problem that must be solved anew each semester. The main technique of the presented approach is to analyze data to resolve uncertainties in lecturers’ preferences and constraints within a department, in order to obtain a ranking for each lecturer based on their requirements, with the aim of increasing their satisfaction and develo...
Semi-supervised clustering via multi-level random walk
A key issue of semi-supervised clustering is how to utilize the limited but informative pairwise constraints. In this paper, we propose a new graph-based constrained clustering algorithm, named SCRAWL. It is composed of two random walks with different granularities. In the lower-level random walk, SCRAWL partitions the vertices (i.e., data points) into constrained and unconstrained ones, accord...
Multiresolution genetic clustering algorithm for texture segmentation
This work plans to approach the texture segmentation problem by incorporating genetic algorithm and K-means clustering method within a multiresolution structure. As the algorithm descends the multiresolution structure, the coarse segmentation results are propagated down to the lower levels so as to reduce the inherent class–position uncertainty and to improve the segmentation accuracy. The proc...